Multiclass classification likewise has an error rate and an accuracy
$$ \begin{align*} \quad E_D (f) = \frac{1}{m} \sum_{i \in [m]} \Ibb (y_i \ne f(\xv_i)), \quad \text{Acc}_D (f) = 1 - E_D (f) \end{align*} $$
as well as a confusion matrix

| | Predicted class $1$ | Predicted class $2$ | ... | Predicted class $c$ |
|---|---|---|---|---|
| True class $1$ | | | | |
| True class $2$ | | | | |
| ... | | | | |
| True class $c$ | | | | |
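The quantities above can be computed directly; a minimal sketch with scikit-learn, using made-up labels for a 3-class problem:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# made-up true labels and predictions for a 3-class problem
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])

error_rate = np.mean(y_true != y_pred)  # E_D(f) = (1/m) sum_i 1(y_i != f(x_i))
accuracy = 1 - error_rate               # Acc_D(f) = 1 - E_D(f)

# C[i, j] = number of samples whose true class is i and predicted class is j
C = confusion_matrix(y_true, y_pred)
```

Note that the accuracy is exactly the sum of the diagonal of the confusion matrix divided by the total number of samples.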
The error rate (0-1 loss) is discontinuous and hard to optimize, so the cross-entropy loss is usually adopted instead
Let the prediction functions for the $c$ classes be $f_1, \ldots, f_c$; then the prediction for a sample $x$ is
$$ \begin{align*} \quad \pv = \left[ \frac{e^{f_1(x)}}{\sum_{j \in [c]} e^{f_j(x)}}, \frac{e^{f_2(x)}}{\sum_{j \in [c]} e^{f_j(x)}}, \ldots, \frac{e^{f_c(x)}}{\sum_{j \in [c]} e^{f_j(x)}} \right] \quad \longleftarrow \text{softmax} \end{align*} $$
This is a $c$-dimensional vector, and also a discrete probability distribution
The class label $y$ can be converted into its one-hot encoding $\ev_y$, which is likewise a $c$-dimensional discrete probability distribution
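The softmax vector $\pv$ and the one-hot encoding $\ev_y$ can be sketched in a few lines of numpy (the scores below are made up):

```python
import numpy as np

def softmax(f):
    # subtract the max before exponentiating for numerical stability;
    # this does not change the resulting distribution
    z = np.exp(f - f.max())
    return z / z.sum()

f = np.array([2.0, 1.0, 0.1])  # made-up scores f_1(x), ..., f_c(x)
p = softmax(f)                 # a c-dimensional discrete distribution
y = 0                          # class label (0-indexed here)
e_y = np.eye(len(f))[y]        # one-hot encoding e_y, also a distribution
```

Both `p` and `e_y` are nonnegative and sum to one, which is what lets us compare them as probability distributions.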
Requirements for the surrogate loss: it should be continuous in $\pv$ and $\ev_y$, and the closer $\pv$ and $\ev_y$ are, the smaller the loss
Question: given a discrete probability distribution $\qv$, how do we measure the distance of a distribution $\pv$ from it?
Cross-entropy: $H_{\qv} (\pv) \triangleq - \sum_i q_i \ln p_i$
The cross-entropy is minimized when $\pv = \qv$, at which point $H_{\qv} (\pv)$ equals the entropy $H(\qv)$ of the distribution $\qv$
$$ \begin{align*} \quad \min_{\pv} H_{\qv} (\pv) = - \sum_i q_i \ln p_i, \quad \st ~ \sum_i p_i = 1 \end{align*} $$
The Lagrangian is $L(p_i, \alpha) = - \sum_i q_i \ln p_i + \alpha (\sum_i p_i - 1)$, hence
$$ \begin{align*} \quad \nabla_{p_i} L(p_i, \alpha) & = - \frac{q_i}{p_i} + \alpha = 0 \Longrightarrow q_i = \alpha p_i \\ & \Longrightarrow \sum_i q_i = \alpha \sum_i p_i \Longrightarrow \alpha = 1 \Longrightarrow \pv = \qv \end{align*} $$
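A quick numerical sanity check of this conclusion: for a fixed $\qv$, random distributions $\pv$ never achieve a smaller cross-entropy than $\pv = \qv$ (random trials, not a proof; the distribution below is made up):

```python
import numpy as np

def cross_entropy(q, p):
    # H_q(p) = - sum_i q_i ln p_i
    return -np.sum(q * np.log(p))

q = np.array([0.5, 0.3, 0.2])      # a fixed made-up distribution
H_q = cross_entropy(q, q)          # the entropy H(q), attained at p = q

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones(3))  # a random distribution on 3 outcomes
    assert cross_entropy(q, p) >= H_q
```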
For $(x,y)$ with $y \in [c]$, the cross-entropy loss is $- \ln \frac{e^{f_y(x)}}{\sum_{j \in [c]} e^{f_j(x)}}$
For $(x,y)$ with $y \in \{1, -1\}$, take $\qv = [(1+y)/2; (1-y)/2]$; the cross-entropy loss is then
$$ \begin{align*} \quad \text{CE} & = - \frac{1+y}{2} \ln \frac{e^{f_1(x)}}{e^{f_1(x)}+e^{f_2(x)}} - \frac{1-y}{2} \ln \frac{e^{f_2(x)}}{e^{f_1(x)}+e^{f_2(x)}} \\ & = - \frac{1+y}{2} \ln \frac{e^{f_1(x)-f_2(x)}}{e^{f_1(x)-f_2(x)}+1} - \frac{1-y}{2} \ln \frac{1}{e^{f_1(x)-f_2(x)}+1} \\ & = - \frac{1+y}{2} \ln \frac{e^{w(x)}}{e^{w(x)}+1} - \frac{1-y}{2} \ln \frac{1}{e^{w(x)}+1} \quad \leftarrow w(x) \triangleq f_1(x)-f_2(x) \\ & = \begin{cases} \ln (1 + e^{-w(x)}), & y = 1 \\ \ln (1 + e^{w(x)}), & y = -1 \end{cases} \\ & = \ln (1 + e^{- y w(x)}) \end{align*} $$
This shows that the multiclass cross-entropy loss is an extension of the binary logistic loss
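The equivalence derived above can be checked numerically; a sketch with made-up scores, where class 1 plays the role of $y = +1$ and class 2 the role of $y = -1$:

```python
import numpy as np

def multiclass_ce(f, y):
    # cross-entropy of softmax(f) against the one-hot label y (0-indexed)
    return -np.log(np.exp(f[y]) / np.exp(f).sum())

def logistic_loss(w, y):
    # binary logistic loss ln(1 + e^{-y w}) with label y in {+1, -1}
    return np.log1p(np.exp(-y * w))

f = np.array([0.7, -0.4])  # made-up scores f_1(x), f_2(x)
w = f[0] - f[1]            # w(x) = f_1(x) - f_2(x)

assert np.isclose(multiclass_ce(f, 0), logistic_loss(w, +1))  # y = +1, class 1
assert np.isclose(multiclass_ce(f, 1), logistic_loss(w, -1))  # y = -1, class 2
```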
Raw data: tables, images, video, text, speech, ...
Model learning: the core part, learning a mapping used for prediction
Feature engineering:
```python
from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()
print(breast_cancer.DESCR)
# --------------------
# **Data Set Characteristics:**
#
# :Number of Instances: 569
#
# :Number of Attributes: 30 numeric, predictive attributes and the class
#
# :Attribute Information:
#     - radius (mean of distances from center to points on the perimeter)
#     - texture (standard deviation of gray-scale values)
#     - perimeter
#     - area
#     - smoothness (local variation in radius lengths)
#     - compactness (perimeter^2 / area - 1.0)
#     - concavity (severity of concave portions of the contour)
#     - concave points (number of concave portions of the contour)
#     - symmetry
#     - fractal dimension ("coastline approximation" - 1)
#
# The mean, standard error, and "worst" or largest (mean of the three
# worst/largest values) of these features were computed for each image,
# resulting in 30 features. For instance, field 0 is Mean Radius, field
# 10 is Radius SE, field 20 is Worst Radius.
#
#     - class:
#         - WDBC-Malignant
#         - WDBC-Benign
#
# :Summary Statistics:
#
# ===================================== ====== ======
#                                         Min    Max
# ===================================== ====== ======
# radius (mean):                        6.981  28.11
# texture (mean):                       9.71   39.28
# perimeter (mean):                     43.79  188.5
# area (mean):                          143.5  2501.0
# smoothness (mean):                    0.053  0.163
# compactness (mean):                   0.019  0.345
# concavity (mean):                     0.0    0.427
# concave points (mean):                0.0    0.201
# symmetry (mean):                      0.106  0.304
# fractal dimension (mean):             0.05   0.097
# radius (standard error):              0.112  2.873
# texture (standard error):             0.36   4.885
# perimeter (standard error):           0.757  21.98
# area (standard error):                6.802  542.2
# smoothness (standard error):          0.002  0.031
# compactness (standard error):         0.002  0.135
# concavity (standard error):           0.0    0.396
# concave points (standard error):      0.0    0.053
# symmetry (standard error):            0.008  0.079
# fractal dimension (standard error):   0.001  0.03
# radius (worst):                       7.93   36.04
# texture (worst):                      12.02  49.54
# perimeter (worst):                    50.41  251.2
# area (worst):                         185.2  4254.0
# smoothness (worst):                   0.071  0.223
# compactness (worst):                  0.027  1.058
# concavity (worst):                    0.0    1.252
# concave points (worst):               0.0    0.291
# symmetry (worst):                     0.156  0.664
# fractal dimension (worst):            0.055  0.208
# ===================================== ====== ======
#
# :Missing Attribute Values: None
#
# :Class Distribution: 212 - Malignant, 357 - Benign
#
# :Creator: Dr. William H. Wolberg, W. Nick Street, Olvi L. Mangasarian
#
# :Donor: Nick Street
#
# :Date: November, 1995
#
# This is a copy of UCI ML Breast Cancer Wisconsin (Diagnostic) datasets.
# https://goo.gl/U2Uwz2
#
# Separating plane described above was obtained using
# Multisurface Method-Tree (MSM-T) [K. P. Bennett, "Decision Tree
# Construction Via Linear Programming." Proceedings of the 4th
# Midwest Artificial Intelligence and Cognitive Science Society,
# pp. 97-101, 1992], a classification method which uses linear
# programming to construct a decision tree. Relevant features
# were selected using an exhaustive search in the space of 1-4
# features and 1-3 separating planes.
#
# The actual linear program used to obtain the separating plane
# in the 3-dimensional space is that described in:
# [K. P. Bennett and O. L. Mangasarian: "Robust Linear
# Programming Discrimination of Two Linearly Inseparable Sets",
# Optimization Methods and Software 1, 1992, 23-34].
#
# This database is also available through the UW CS ftp server:
# ftp ftp.cs.wisc.edu
# cd math-prog/cpo-dataset/machine-learn/WDBC/
#
# References
# - W.N. Street, W.H. Wolberg and O.L. Mangasarian. Nuclear feature extraction
#   for breast tumor diagnosis. IS&T/SPIE 1993 International Symposium on
#   Electronic Imaging: Science and Technology, volume 1905, pages 861-870,
#   San Jose, CA, 1993.
# - O.L. Mangasarian, W.N. Street and W.H. Wolberg. Breast cancer diagnosis and
#   prognosis via linear programming. Operations Research, 43(4), pages 570-577,
#   July-August 1995.
# - W.H. Wolberg, W.N. Street, and O.L. Mangasarian. Machine learning techniques
#   to diagnose breast cancer from fine-needle aspirates. Cancer Letters 77 (1994)
#   163-171.
```